Amendments to the Specification:

Please replace the paragraph [0002] with the following rewritten paragraph [0002]: 

[0002] Advances in semiconductor technology have created reasonably-priced chips with 
literally hundreds of millions of transistors. This transistor budget has revealed the lack of 
scalability of both multi-issue uni-processor architectures, such as instruction level parallelism 
(ILP) (superscalar and VLIW), and of the classic vector architecture. The most common use for 
the increased transistor budget in CPU designs has been to increase the amount of on-chip 
cache. The performance increase in such CPUs, however, is slowing down. On the other hand, 
main applications are shifting to multimedia 
operation, and the chip multiprocessor is beginning to make a mark because of its exploitation 
of parallelism. 

Please replace the paragraph [0003] with the following rewritten paragraph [0003]: 

[0003] As semiconductor design rules shrank, some scaling problems began to appear. 
Wire delays have failed to scale. This issue has been postponed for about one silicon process 
generation by moving to copper interconnects and low-k dielectrics. But CPU designers already 
know they should no longer expect a signal to propagate completely across a standard-sized die 
within a single clock tick. A solution of such scaling problems is a chip multi-processor, 
integrating small CPUs on a die. 

Please replace the paragraph [0004] with the following rewritten paragraph [0004]: 

[0004] Another factor driving the partitioning of the single, monolithic CPU is 
excessive deep pipelining. As CPU architects add more stages to their pipelines to increase 
speed and more instruction issues to their ILP architectures to increase instructions-per-clock, 
the bypass logic and complicated control logic are approaching their allowable limit. 
Please replace the paragraph [0103] with the following rewritten paragraph [0103]: 

[0103] If willing to give up on using the full bus bandwidth for 32-bit transfers, the 
above may be simplified. In such an embodiment of the invention, a 32-bit write to a FIFO may 
be zero-filled to 64 bits and transferred as a 64-bit value. A 32-bit read from a FIFO may read 
the full 64 bits and only use the LSW. This coalesces the above cases. The decoder may 
prevent the pairing of instructions that both try to read from or both try to write to the same 
FIFO. Also, the decoder may prevent the next instruction from being paired with its 
predecessor, if that next instruction reads multiple values from the same FIFO. Deadlock is 
then avoided. An align instruction would not be needed. In general, an embodiment of a 
decoupled architecture has been described. It will be appreciated that four cross-bar DRAM 
clusters are also applicable to other chip-multiprocessor architectures. In addition, four cross- 
bar DRAM clusters are not always needed, and depend on the required resources. 
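
The simplified FIFO handling described in paragraph [0103] can be illustrated with a minimal sketch. It is not the patented implementation: the names `widen_write32`, `read32`, `Instr`, and `may_pair` are hypothetical, and modelling an instruction as a pair of tuples of FIFO identifiers is an assumption made only for illustration.

```python
from dataclasses import dataclass

MASK32 = 0xFFFFFFFF


def widen_write32(value: int) -> int:
    """Zero-fill a 32-bit write so it can be transferred as a 64-bit value."""
    return value & MASK32  # upper 32 bits of the 64-bit entry are zero


def read32(entry64: int) -> int:
    """Read the full 64-bit FIFO entry and use only the least significant word."""
    return entry64 & MASK32


@dataclass(frozen=True)
class Instr:
    # Hypothetical model: FIFO ids accessed by one instruction.
    # An id repeated in `reads` means multiple values read from that FIFO.
    reads: tuple = ()
    writes: tuple = ()


def may_pair(first: Instr, second: Instr) -> bool:
    """Decoder pairing check: True if `second` may issue together with `first`."""
    # Both instructions read from the same FIFO: do not pair.
    if set(first.reads) & set(second.reads):
        return False
    # Both instructions write to the same FIFO: do not pair.
    if set(first.writes) & set(second.writes):
        return False
    # The next instruction reads multiple values from the same FIFO: do not pair.
    if len(second.reads) != len(set(second.reads)):
        return False
    return True
```

Under these assumptions, every transfer is treated as a 64-bit transfer, so the separate 32-bit cases collapse into one, and the three pairing checks correspond to the decoder restrictions stated in the paragraph.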

Please replace the paragraph [0122] with the following rewritten paragraph [0122]: 

[0122] The following related applications have been filed on March 31, 2003 (each 
having the same inventors): 

VECTOR INSTRUCTIONS COMPOSED FROM SCALAR INSTRUCTIONS assigned Application No. 
10/403,241; TABLE LOOKUP INSTRUCTION FOR PROCESSORS USING TABLES IN LOCAL 
MEMORY assigned Application No. 10/403,209; VIRTUAL DOUBLE WIDTH ACCUMULATORS FOR 
VECTOR PROCESSING assigned Application No. 10/403,315; and CPU DATAPATHS AND LOCAL 
MEMORY THAT EXECUTES EITHER VECTOR OR SUPERSCALAR INSTRUCTIONS assigned 
Application No. 10/403,216. 

The following related application has been filed on March 7, 2003 (having the same inventors): 

LOCAL MEMORY WITH OWNERSHIP THAT IS TRANSFERABLE BETWEEN NEIGHBORING 
PROCESSORS assigned Application No. 10/384,198. 